
    Fast Data Analytics by Learning

    Today, we collect a large amount of data, and the volume of data we collect is projected to grow faster than computational power. This rapid growth inevitably increases query latencies, and horizontal scaling alone is not sufficient for real-time analytics over big data. Approximate query processing (AQP) speeds up data analytics at the cost of small quality losses in query answers. AQP produces query answers from synopses of the original data; because the synopses are smaller than the original data, AQP requires less computation and can therefore produce answers more quickly. In AQP, there is a general tradeoff between query latency and answer quality: obtaining higher-quality answers requires longer query latencies. In this dissertation, we show that we can speed up approximate query processing without reducing the quality of the query answers by optimizing the synopses using two approaches. 1. Exploiting past computations: we exploit the answers to past queries, relying on the fact that if two aggregations involve common or correlated values, the aggregated results must also be correlated. We formally capture this idea with a probability distribution function, which is then used to refine the answers to new queries. 2. Building task-aware synopses: by optimizing synopses for a few common types of data analytics, we can produce higher-quality answers (or answers of a given target quality more quickly) for those tasks; we use this approach to construct synopses optimized for searching and visualization. Both approaches incorporate statistical inference and optimization techniques. The contributions in this dissertation resulted in up to 20x speedups for real-world data analytics workloads.
    PhD dissertation, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/138598/1/pyongjoo_1.pd
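    A minimal sketch of the first idea (exploiting past computations), assuming the new and past aggregate answers can be modeled as jointly Gaussian so that the observed past answer refines the new estimate via conditioning. The function name, numbers, and bivariate-normal model are illustrative assumptions, not the dissertation's actual formulation.

```python
import math

def refine_answer(new_mean, new_var, past_mean, past_var, cov, past_observed):
    """Refine a new query's approximate answer using a correlated past answer.

    Models the (new, past) aggregates as bivariate normal and conditions the
    new answer on the observed value of the past answer.
    """
    rho2 = cov**2 / (new_var * past_var)                       # squared correlation
    refined_mean = new_mean + (cov / past_var) * (past_observed - past_mean)
    refined_var = new_var * (1.0 - rho2)                       # shrinks when correlation is high
    return refined_mean, refined_var

# Example: a sample-based estimate of AVG(price) for March, refined using the
# already-computed answer for February (all values are made up).
mean, var = refine_answer(new_mean=102.0, new_var=25.0,
                          past_mean=98.0, past_var=20.0,
                          cov=18.0, past_observed=101.5)
print(f"refined answer: {mean:.2f} +/- {1.96 * math.sqrt(var):.2f}")
```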

    AirIndex: Versatile Index Tuning Through Data and Storage

    The end-to-end lookup latency of a hierarchical index -- such as a B-tree or a learned index -- is determined by its structure: the number of layers, the kinds of branching functions appearing in each layer, the amount of data that must be fetched from each layer, and so on. Our primary observation is that by optimizing these structural parameters (or designs) specifically for a target system's I/O characteristics (e.g., latency, bandwidth), we can offer faster lookups than unoptimized designs. Can we develop a systematic method for finding those optimal design parameters? Ideally, the method should be able to generate almost any existing index, or a novel combination of them, for the fastest possible lookup. In this work, we present a new data- and I/O-aware index builder (called AirIndex) that can find high-speed hierarchical index designs in a principled way. Specifically, AirIndex minimizes an objective function expressing the end-to-end latency in terms of the design choices -- the number of layers, the types of layers, and more -- for given data and a storage profile, using a graph-based optimization method purpose-built to address the computational challenges arising from the inter-dependencies among index layers and the exponentially many candidate parameters in a large search space. Our empirical studies confirm that AirIndex can find optimal index designs, build them within times comparable to existing methods, and deliver up to 4.1x faster lookups than a lightweight B-tree library (LMDB), 3.3x--46.3x faster than state-of-the-art learned indexes (RMI/CDFShop, PGM-Index, ALEX/APEX, PLEX), and 2.0x faster than Data Calculator's suggestion across various dataset and storage settings.
    Comment: 13 pages, 3 appendices, 19 figures, to appear at SIGMOD 202
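    A minimal sketch of the kind of search AirIndex describes, cast here as a memoized dynamic program over candidate per-layer fanouts under a toy storage profile. The cost model, constants, and candidate fanouts are assumptions for illustration, not AirIndex's actual objective or algorithm.

```python
import math
from functools import lru_cache

# Illustrative storage profile: per-request latency (s) and bandwidth (bytes/s).
LATENCY = 1e-4
BANDWIDTH = 200e6
KEY_SIZE = 16                         # assumed bytes per (key, pointer) entry
FANOUTS = [16, 64, 256, 1024, 4096]   # candidate branching factors per layer

def layer_cost(fanout):
    """Time to fetch and scan one index node with the given fanout."""
    return LATENCY + (fanout * KEY_SIZE) / BANDWIDTH

@lru_cache(maxsize=None)
def best_plan(n_keys):
    """Cheapest end-to-end lookup over all layer layouts covering n_keys keys."""
    if n_keys <= 1:
        return 0.0, ()
    best = (math.inf, ())
    for f in FANOUTS:
        # Add one layer with fanout f, then solve the remaining key range.
        sub_cost, sub_plan = best_plan(math.ceil(n_keys / f))
        best = min(best, (layer_cost(f) + sub_cost, (f,) + sub_plan))
    return best

cost, plan = best_plan(10**9)
print(f"estimated lookup: {cost * 1e3:.2f} ms with per-layer fanouts {plan}")
```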

    Unified Hierarchical Relationship Between Thermodynamic Tradeoff Relations

    Recent years have witnessed a surge of discoveries in the study of thermodynamic inequalities: the thermodynamic uncertainty relation (TUR) and the entropic bound (EB) provide a lower bound on the entropy production (EP) in terms of nonequilibrium currents; the classical speed limit (CSL) expresses a lower bound on the EP using the geometry of probability distributions; and the power-efficiency (PE) tradeoff dictates the maximum power achievable by a heat engine at a given level of thermal efficiency. In this study, we show that there exists a unified hierarchical structure encompassing all of these bounds, with the fundamental inequality given by a novel extension of the TUR (XTUR) that incorporates the most general range of current-like and state-dependent observables. By selecting more specific observables, the TUR and the EB follow from the XTUR, and the CSL and the PE tradeoff follow from the EB. Our derivations cover both Langevin and Markov jump systems, including the first proof of the EB for Markov jump systems and a more generalized form of the CSL. We also present concrete examples of the EB for Markov jump systems and of the generalized CSL.
    Comment: 19 pages, 4 figures
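    For reference, the conventional steady-state TUR that the XTUR generalizes can be written as follows; this is the textbook form, not the paper's extended inequality.

```latex
% Standard steady-state thermodynamic uncertainty relation (TUR):
% the relative fluctuation of any time-integrated current J bounds
% the total entropy production \Sigma from below.
\[
  \frac{\operatorname{Var}(J)}{\langle J \rangle^{2}} \;\ge\; \frac{2 k_{\mathrm{B}}}{\Sigma}
  \quad\Longleftrightarrow\quad
  \Sigma \;\ge\; \frac{2 k_{\mathrm{B}} \langle J \rangle^{2}}{\operatorname{Var}(J)} .
\]
```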

    ElasticNotebook: Enabling Live Migration for Computational Notebooks (Technical Report)

    Computational notebooks (e.g., Jupyter, Google Colab) are widely used for interactive data science and machine learning. In these frameworks, users start a session and then execute cells (i.e., sets of statements) to create variables, train models, visualize results, etc. Unfortunately, existing notebook systems do not offer live migration: when a notebook is launched on a new machine, it loses its state, preventing users from continuing their tasks from where they left off. This is because, unlike a DBMS, notebook sessions rely directly on underlying kernels (e.g., Python/R interpreters) without an additional data management layer. Existing techniques for preserving state, such as copying all variables or OS-level checkpointing, are unreliable (they often fail), inefficient, and platform-dependent, while re-running code from scratch can be highly time-consuming. In this paper, we introduce a new notebook system, ElasticNotebook, that offers live migration via checkpointing/restoration using a novel mechanism that is reliable, efficient, and platform-independent. Specifically, by observing all cell executions via transparent, lightweight monitoring, ElasticNotebook can find a reliable and efficient way (i.e., a replication plan) to reconstruct the original session state, considering variable-cell dependencies, observed runtimes, variable sizes, etc. To this end, our new graph-based optimization problem determines how to reconstruct all variables (efficiently) from a subset of variables that can be transferred across machines. We show that ElasticNotebook reduces end-to-end migration and restoration times by 85%-98% and 94%-99%, respectively, on a variety of notebooks (i.e., Kaggle, JWST, and Tutorial) with negligible runtime and memory overheads of <2.5% and <10%.
    Comment: Accepted to VLDB 202
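    A toy sketch of the replication-plan idea, assuming per-variable serialization and recomputation costs are known and searching exhaustively over which variables to ship versus re-run. The variable names, costs, and brute-force search are illustrative stand-ins for ElasticNotebook's actual graph-based optimization.

```python
from itertools import combinations

# Illustrative session state: cost (s) to serialize/transfer each variable,
# cost (s) to recompute it from its producing cell, and its dependencies.
SERIALIZE_COST = {"df": 8.0, "model": 2.0, "fig": 0.5}
RECOMPUTE_COST = {"df": 1.0, "model": 30.0, "fig": 3.0}
DEPENDS_ON = {"df": set(), "model": {"df"}, "fig": {"model"}}

def plan_cost(shipped):
    """Migration cost if `shipped` variables are serialized and the rest re-run."""
    cost = sum(SERIALIZE_COST[v] for v in shipped)
    cost += sum(RECOMPUTE_COST[v] for v in SERIALIZE_COST if v not in shipped)
    return cost

def restore_order(shipped):
    """Topological order for re-running the non-shipped variables."""
    remaining = [v for v in SERIALIZE_COST if v not in shipped]
    order, done = [], set(shipped)
    while remaining:
        v = next(x for x in remaining if DEPENDS_ON[x] <= done)
        order.append(v); done.add(v); remaining.remove(v)
    return order

variables = list(SERIALIZE_COST)
best = min((frozenset(c) for r in range(len(variables) + 1)
            for c in combinations(variables, r)), key=plan_cost)
print(f"ship {sorted(best)}, re-run {restore_order(best)}, "
      f"total migration cost {plan_cost(best):.1f}s")
```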

    All-rounder: A flexible DNN accelerator with diverse data format support

    Recognizing the explosive increase in DNN-based applications, several companies have developed custom ASICs (e.g., Google TPU, IBM RaPiD, Intel NNP-I/NNP-T) and built hyperscale cloud infrastructure around them. These ASICs perform the inference or training operations of DNN models requested by users. Since DNN models use different data formats and types of operations, an ASIC needs to support diverse data formats and remain general across operations, requirements that conventional ASICs do not fulfill. To overcome these limitations, we propose a flexible DNN accelerator called All-rounder. The accelerator is designed with an area-efficient multiplier supporting multiple precisions of integer and floating-point datatypes. In addition, it incorporates a flexibly fusible and fissionable MAC array to support various types of DNN operations efficiently. We implemented the register transfer level (RTL) design in Verilog and synthesized it in a 28nm CMOS technology. To examine the practical effectiveness of our proposed designs, we implemented two baseline multiply units and three state-of-the-art DNN accelerators; we compare our multiplier with the multiply units and perform an architectural evaluation of performance and energy efficiency on eight real-world DNN models. Furthermore, we compare the benefits of the All-rounder accelerator against a high-end GPU card (NVIDIA GeForce RTX 3090). The proposed All-rounder accelerator consistently achieves higher speed and energy efficiency than the baselines across various DNN benchmarks.
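    A toy sketch of the precision-composable multiplication idea behind such area-efficient multipliers: four 4-bit multipliers are fused into one 8-bit multiply, while the same units could instead serve independent low-precision multiplies. This is a generic decomposition shown for illustration, not All-rounder's actual datapath.

```python
def mul8_from_4bit_units(a, b):
    """Compose one unsigned 8x8 multiply from four 4x4 partial products.

    The same four 4-bit multipliers could instead perform four independent
    low-precision multiplies, which captures the fusion/fission idea in spirit.
    """
    a_hi, a_lo = a >> 4, a & 0xF
    b_hi, b_lo = b >> 4, b & 0xF
    # Four 4x4 partial products, shifted to their binary weight and summed.
    return ((a_hi * b_hi) << 8) + ((a_hi * b_lo) << 4) \
         + ((a_lo * b_hi) << 4) + (a_lo * b_lo)

# Exhaustive check over all unsigned 8-bit operands.
assert all(mul8_from_4bit_units(a, b) == a * b
           for a in range(256) for b in range(256))
print("8-bit multiply composed correctly from 4-bit units")
```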

    Electroless Gold Plating on Aluminum Patterned Chips for CMOS-based Sensor Applications

    We present an approach for the activation of an aluminum (Al) alloy using palladium (Pd) and the subsequent gold (Au) electroless plating (ELP) for complementary metal oxide semiconductor (CMOS)-based sensor applications. In this study, CMOS-process-compatible Al-patterned chips were used as substrates for easy incorporation with existing CMOS circuits. To improve the contact resistance arising from the Schottky barrier between the metal electrodes and the single-walled carbon nanotubes (SWCNTs), electroless deposition of gold, which has a higher work function than Al, was adopted because SWCNTs have p-type semiconductor properties. Each step of the Au ELP procedure was studied under various bath temperatures, immersion times, and chemical concentrations. Fine Pd particles were homogeneously distributed on the Al surface by the Pd activation process at room temperature. Au ELP allowed selective deposition of the Au film on the activated Al surface only. The SWCNT networks formed on the Au-plated chip by a dip-coating method showed improved contact resistance and reduced resistance variation between the Au electrode and the SWCNTs. We also decorated SWCNTs with Au particles using the above Au ELP method, which is expected to be applicable in various areas, including field-effect transistors and sensor devices. This work was supported by the Nano Systems Institute-National Core Research Center (NSI-NCRC) program of NRF and the TDPAF, Ministry for Agriculture, Forestry and Fisheries, Republic of Korea.